1 Introduction

This HTML notebook will replicate the STATA do file named vnm_mics14_dp2019 for distribution in R. The goal is to create an R script that does what the said do file does.

I will follow the same section numbering as in the do file for ease of comparision.

Goes without saying, all mistakes in this document are mine.

2 Libraries

Before we go into the excercise, following are the packages from which we will use various functions.

## here() starts at D:/R projects/OPHI/Vietnam Translation/Vietnam-MICS-MPI-to-R
## -- Attaching packages ---------------
## v ggplot2 3.3.0     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.0.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts ------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'janitor' was built under R version 3.6.3
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
## Warning: package 'gt' was built under R version 3.6.3
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract

3 Building a Well-Being Index from Vietnam MICS 2013-14

The memory and environment clearing commands and the commands for setting working folders and paths are not needed if one is working in Project mode through RStudio and uses the {here} (read as package here). This is a library for managing paths and directories.

4 Vietnam MICS 2014

4.1 Step 1: Data Preparation

__ Selecting main variables from CH, WM, HH & MN recode & merging with HL recode __

It should be noted that anthropometric data was not collected for children under 5 as part of the Viet Nam MICS 2014 dataset. Previously, nutrition data was collected as part of Viet Nam MICS 2011. However, the data was not collected in this round due to time and resource constraints as well as the availability of national nutrition survey data (p.61)

Above comments are copied from the STATA do file

4.1.1 Step 1.1: CH - Children’s Recode (Under 5)

According to the STATA do file there is no data for this section.

4.1.2 Step 1.2: BH - Birth Recode (All females 15-49 years who ever gave birth)

The purpose of step 1.2 is to identify children of any age who died in the last 5 years prior to the survey date. As seen in the STATA file

Loading the data from bh.sav.

The above code chunk loads the bh.sav and names it bh_dat(a data object). The clean names function gets all the variable names in lower snake case. IN case if anyone is wondering what are all the possible cases, please refere to the wonderful art by Allison Horst shown below.

Various Cases (Let me know your favourite)

Various Cases (Let me know your favourite)

Now let us take a glimplse at the data and names of the variables.

##  [1] "hh1"       "hh2"       "ln"        "bhln"      "bh2"       "bh3"      
##  [7] "bh4m"      "bh4y"      "bh5"       "bh6"       "bh7"       "bh8"      
## [13] "bh9u"      "bh9n"      "bh10"      "bh4c"      "bh4f"      "bh9c"     
## [19] "bh9f"      "hh6"       "hh7"       "wdoi"      "wdob"      "ethnicity"
## [25] "welevel"   "brthord"   "magebrt"   "birthint"  "wmweight"  "wscore"   
## [31] "windex5"   "wscoreu"   "windex5u"  "wscorer"   "windex5r"  "windex2"
hh1 hh2 ln bhln bh2 bh3 bh4m bh4y bh5 bh6 bh7 bh8 bh9u bh9n bh10 bh4c bh4f bh9c bh9f hh6 hh7 wdoi wdob ethnicity welevel brthord magebrt birthint wmweight wscore windex5 wscoreu windex5u wscorer windex5r windex2
1 2 2 1 1 1 4 2011 1 2 1 4 NA NA NA 1336 1 NA NA 1 1 1368 1044 1 4 1 2 1 2.167035 1.640778 5 1.337577 5 NA NA 2
1 2 2 2 1 1 5 2013 1 0 1 5 NA NA 2 1361 1 NA NA 1 1 1368 1044 1 4 2 2 2 2.167035 1.640778 5 1.337577 5 NA NA 2
1 3 2 1 1 2 3 2003 1 10 1 4 NA NA NA 1239 1 NA NA 1 1 1368 939 1 3 1 2 1 2.167035 1.427797 5 1.071892 5 NA NA 2
1 3 2 2 1 1 6 2007 1 6 1 5 NA NA 2 1290 1 NA NA 1 1 1368 939 1 3 2 2 4 2.167035 1.427797 5 1.071892 5 NA NA 2
1 4 2 1 1 2 11 2007 1 6 1 4 NA NA NA 1295 1 NA NA 1 1 1368 1008 1 4 1 2 1 2.167035 1.613568 5 1.303632 5 NA NA 2
1 6 2 1 1 1 3 1995 1 18 1 3 NA NA NA 1143 1 NA NA 1 1 1368 788 1 4 1 2 1 2.167035 1.624926 5 1.317802 5 NA NA 2

Now let us create variables, only those that are non grouped, i.e. those generated without using bysort. In R one can create grouped varibales in a different manner. Will show that in the code chunk following this one.

Before moving forward, please note that, a large part of coding involved in creating this index is a data wrangling process. The tidyverse way of carrying out data wrangling extensively employs {dplyr} (read as ‘package dplyr’) and its mutate function. Here is a visual cue (created by Allison Horst) to what the mutate function does.

Wrangling and adding columns

Wrangling and adding columns

bh_dat <- bh_dat %>% 
  mutate(
    ind_id = structure(((hh1*100000)+(hh2*100)+ln),
                       label = "Individual ID"),
    date_death = structure(bh4c + bh9c, 
                           label = "Date of Death CMC"),
    mdead_survey = structure(ifelse(((bh9c == 0 |is.na(bh9c)) & bh5 == 1),
                                     NA, 
                                     wdoi - date_death), 
                             label = "Months Dead from Survey"),
    ydead_survey = structure(mdead_survey/12, 
                             label = "Years dead from Survey"),
    age_death = structure(ifelse(bh5 == 2, bh9c, NA),
                          label = "Age at death in Months"),
    child_died = structure(forcats::as_factor(
      ifelse(is.na(bh5),NA,
             ifelse(bh5==1,"child is alive","child is dead"))))
    ) %>% 
  group_by(ind_id) %>% 
  mutate(
    tot_child_died = sum(child_died == "child is dead", na.rm = T)
  ) %>% 
  ungroup() %>% 
  mutate(
    child18_died = forcats::as_factor(
      ifelse((!is.na(age_death) & age_death > 216),
             ifelse(is.na(bh5),NA,
                    ifelse(bh5==1,
                           "child is alive",
                           "child is dead")), 
             "child is alive"))
    ) %>% 
  group_by(ind_id) %>% 
  mutate(
    tot_child18_died_5y = sum(
      child18_died[ydead_survey <= 5 & !is.na(ydead_survey)] == "child is dead", na.rm = T)
  ) %>% 
  ungroup() %>% 
  mutate(
    tot_child18_died_5y = ifelse(
      (is.na(tot_child18_died_5y) & tot_child_died>=0 & !is.na(tot_child_died)),0,tot_child18_died_5y),
    tot_child18_died_5y = ifelse(
      child18_died == "child is dead" & is.na(ydead_survey), 
      NA, tot_child18_died_5y)
  ) %>% 
  group_by(ind_id) %>% 
  mutate(
    childu18_dies_per_wom_5y = structure(
      max(tot_child18_died_5y,na.rm = T),
      label = "Total child under 18 death for each women in the last 5 years (birth recode)")
  ) %>% 
  ungroup()

Check if all ind ids are uniuqe

## [1] FALSE

There are duplicate ind_ids. Let us remove duplicates.

Check again if all ind ids are uniuqe

## [1] TRUE

Creating Tables for new variables check.

age_death n percent valid_percent
0 111 0.0156316012 0.493333333
1 12 0.0016899028 0.053333333
2 6 0.0008449514 0.026666667
3 5 0.0007041262 0.022222222
4 1 0.0001408252 0.004444444
5 3 0.0004224757 0.013333333
6 7 0.0009857767 0.031111111
7 2 0.0002816505 0.008888889
8 4 0.0005633009 0.017777778
9 2 0.0002816505 0.008888889
11 2 0.0002816505 0.008888889
12 8 0.0011266019 0.035555556
14 3 0.0004224757 0.013333333
18 1 0.0001408252 0.004444444
23 2 0.0002816505 0.008888889
24 11 0.0015490776 0.048888889
36 11 0.0015490776 0.048888889
48 5 0.0007041262 0.022222222
72 5 0.0007041262 0.022222222
84 2 0.0002816505 0.008888889
96 1 0.0001408252 0.004444444
108 4 0.0005633009 0.017777778
120 1 0.0001408252 0.004444444
144 1 0.0001408252 0.004444444
168 1 0.0001408252 0.004444444
180 1 0.0001408252 0.004444444
192 2 0.0002816505 0.008888889
204 1 0.0001408252 0.004444444
216 2 0.0002816505 0.008888889
228 3 0.0004224757 0.013333333
240 2 0.0002816505 0.008888889
264 1 0.0001408252 0.004444444
276 1 0.0001408252 0.004444444
312 1 0.0001408252 0.004444444
NA 6876 0.9683143219 NA
bh5 child is alive child is dead
1 6876 0
2 0 225
child18_died n percent
child is alive 7093 0.998873398
child is dead 8 0.001126602
##  tot_child_died    0 1
##               0 6647 0
##               1  382 7
##               2   55 0
##               3    6 2
##               4    2 0

Writing new files.

4.1.3 Step 1.3: WM - Women’s Record (All eligible females 15-49 in the household)

Loading data and setting case for variable (column) names.

Mutating the data for creating new variables. This is what one would refere to generating new varibles in STATA, I think.

Note, for the varible marital or marital_wom, I miss to see whyn the STATA code defines it as marital to begin with and renames it to marital_wom. I will define this varibale as marital_wom from the begining itself. Kindly bring to notice if this is incorrect.

While creating new variables the STATA code has various tables. I will generate those here.

## [1] TRUE

This means all Individual Ids are unique, i.e., every row represents a unique individual.

cm1 1 2 NA_
1 454 6647 0
2 0 2726 0
NA 0 0 363
## # A tibble: 10 x 2
##                           mstatus       ma6
##                         <dbl+lbl> <dbl+lbl>
##  1 1 [Currently married/in union]        NA
##  2 1 [Currently married/in union]        NA
##  3 1 [Currently married/in union]        NA
##  4 3 [Never married/in union]            NA
##  5 3 [Never married/in union]            NA
##  6 3 [Never married/in union]            NA
##  7 3 [Never married/in union]            NA
##  8 1 [Currently married/in union]        NA
##  9 1 [Currently married/in union]        NA
## 10 3 [Never married/in union]            NA
mstatus 1 2 3 NA_
1 0 0 0 6972
2 207 193 107 0
3 0 0 0 2348
NA 0 0 0 363
marital_wom n percent valid_percent
Currently Married 6972 0.68420020 0.70947390
Never Married 2348 0.23042198 0.23893355
Widowed 207 0.02031403 0.02106441
Divorced 193 0.01894014 0.01963977
Seperated/Not living together 107 0.01050049 0.01088837
NA 363 0.03562316 NA
ma6 Currently Married Never Married Widowed Divorced Seperated/Not living together NA_
1 0 0 207 0 0 0
2 0 0 0 193 0 0
3 0 0 0 0 107 0
NA 6972 2348 0 0 0 363
mstatus Currently Married Never Married Widowed Divorced Seperated/Not living together NA_
1 6972 0 0 0 0 0
2 0 0 207 193 107 0
3 0 2348 0 0 0 0
NA 0 0 0 0 0 363

Keeping relevant columns.

Writing the tables in csv.

4.1.4 Step 1.4: MR - MEn’s Recode (All eligible men in the household)

As mentioned in the STATA code, Note: There is no male recode file for Viet Nam MICS 2014. Hence the commands under this section have been remove

4.1.7 Step 1.7: Data Merging

Unlike the STATA code the merging here will look a little different. It will be in a single flow. However, checks will be shown prior to mrunnig the merge command.

##  hl7     n      percent
##    0 29190 0.7412392077
##    1  1065 0.0270441849
##    2  4650 0.1180802438
##    3  2037 0.0517267649
##    4  1479 0.0375571356
##    5   541 0.0137379380
##    6   228 0.0057897410
##    7   110 0.0027932961
##    8    38 0.0009649568
##    9    27 0.0006856272
##   10     9 0.0002285424
##   11     3 0.0000761808
##   13     2 0.0000507872
##   15     1 0.0000253936
##  women_wm FALSE  TRUE
##         1     0 10190
##        NA 29190     0
## [1] wm7     n       percent
## <0 rows> (or 0-length row.names)

Writing this object for safe keeping.

4.1.8 Step 1.8: Control Variables

Comments from the STATA code:

Households are identified as having ‘no eligible’ members if there are no applicable population, that is, children 0-5 years, adult women 15-49 years or adult men. These households will not have information on relevant indicators of health. As such, these households are considered as non-deprived in those relevant indicators.

As one must have noticed from previous sections, I prefer not to gererate variables that are grouped in the same data frame. However, in the following sections I will follow as the code in STATA do file. This is because the code seems to mutate the same data frame starting from section Step 1.6.

Having said this, I should explain why I am averse to this practice. Consider the following example.

## # A tibble: 6 x 2
##       a b    
##   <dbl> <chr>
## 1     1 a    
## 2     2 b    
## 3     3 c    
## 4     4 a    
## 5     5 b    
## 6     6 c

Say, we want sum of column a for each type of observation in column b. We shall name this new variable c.

## # A tibble: 6 x 3
## # Groups:   b [3]
##       a b         c
##   <dbl> <chr> <dbl>
## 1     1 a         5
## 2     2 b         7
## 3     3 c         9
## 4     4 a         5
## 5     5 b         7
## 6     6 c         9

Above is the output we would generate if we wanted a sum of ‘a’ grouped by ‘b’. This might end up being misleading to someone who has neve seen the code but want to use the data. Some might do a simple average of cloumn c or try other summary statistics over this new column.

Moving on.

4.1.8.1 No Eligible Women 15-49 years

no_fem_eligible n percent
FALSE 33847 0.8594972
TRUE 5533 0.1405028

4.1.8.2 No Men Eligible

no_male_eligible n percent valid_percent
NA 39380 1 NA

4.1.8.3 NO Eligible Children 0-5 years

no_child_eligible n percent valid_percent
NA 39380 1 NA

4.1.8.4 No Eligible Women and Men

no_adults_eligible n percent valid_percent
NA 39380 1 NA

4.1.8.5 NO Eligible Children and Women

no_child_fem_eligible n percent valid_percent
NA 39380 1 NA

4.1.8.6 No Eligible Women, Men or Children

no_eligibles n percent valid_percent
NA 39380 1 NA

4.1.8.7 No Eligible Subsample

## # A tibble: 39,380 x 259
##    hh1.x hh2.x   hl1     hl3     hl4    hl5m  hl5y   hl6     hl7    hl7b
##    <dbl> <dbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl> <dbl+l> <dbl+l>
##  1     1     1     1 1 [Hea~ 1 [Mal~  5       1964    49 0 [Not~ 0 [Not~
##  2     1     1     2 2 [Wif~ 2 [Fem~  6       1963    50 0 [Not~ 0 [Not~
##  3     1     1     3 3 [Son~ 1 [Mal~ 10       1998    15 0 [Not~ 0 [Not~
##  4     1     2     1 1 [Hea~ 2 [Fem~  9       1957    56 0 [Not~ 0 [Not~
##  5     1     2     2 3 [Son~ 2 [Fem~ 12       1986    27 2       0 [Not~
##  6     1     2     3 4 [Son~ 1 [Mal~ 11       1983    30 0 [Not~ 0 [Not~
##  7     1     2     4 5 [Gra~ 1 [Mal~  4       2011     2 0 [Not~ 4      
##  8     1     2     5 5 [Gra~ 1 [Mal~  5       2013     0 0 [Not~ 5      
##  9     1     3     1 1 [Hea~ 2 [Fem~ 98 [DK]  1943    70 0 [Not~ 0 [Not~
## 10     1     3     2 3 [Son~ 2 [Fem~  3       1978    35 2       0 [Not~
## # ... with 39,370 more rows, and 249 more variables: hl11 <dbl+lbl>,
## #   hl12 <dbl+lbl>, hl12a <dbl+lbl>, hl13 <dbl+lbl>, hl14 <dbl+lbl>,
## #   hl14a <dbl+lbl>, hl15 <dbl>, ed1 <dbl>, ed3 <dbl+lbl>, ed4a <dbl+lbl>,
## #   ed4b <dbl+lbl>, ed5 <dbl+lbl>, ed6a <dbl+lbl>, ed6b <dbl+lbl>,
## #   ed7 <dbl+lbl>, ed8a <dbl+lbl>, ed8b <dbl+lbl>, mp3 <dbl>, mp5 <dbl+lbl>,
## #   mp6 <dbl+lbl>, mp7 <dbl+lbl>, hh5d.x <dbl>, hh5m.x <dbl>, hh5y.x <dbl>,
## #   hh6.x <dbl+lbl>, hh7.x <dbl+lbl>, mline <dbl+lbl>, fline <dbl+lbl>,
## #   ethnicity.x <dbl+lbl>, helevel.x <dbl+lbl>, melevel <dbl+lbl>,
## #   felevel <dbl+lbl>, schage <dbl+lbl>, hhweight.x <dbl>, wscore.x <dbl>,
## #   windex5.x <dbl+lbl>, wscoreu.x <dbl>, windex5u.x <dbl+lbl>,
## #   wscorer.x <dbl>, windex5r.x <dbl+lbl>, hh_id <dbl>, ind_id <dbl>,
## #   hh1.y <dbl>, hh2.y <dbl>, ln <dbl>, bhln <dbl>, bh2 <dbl+lbl>,
## #   bh3 <dbl+lbl>, bh4m <dbl+lbl>, bh4y <dbl+lbl>, bh5 <dbl+lbl>,
## #   bh6 <dbl+lbl>, bh7 <dbl+lbl>, bh8 <dbl+lbl>, bh9u <dbl+lbl>,
## #   bh9n <dbl+lbl>, bh10 <dbl+lbl>, bh4c <dbl>, bh4f <dbl+lbl>, bh9c <dbl>,
## #   bh9f <dbl+lbl>, hh6.y <dbl+lbl>, hh7.y <dbl+lbl>, wdoi <dbl>, wdob <dbl>,
## #   ethnicity.y <dbl+lbl>, welevel <dbl+lbl>, brthord <dbl+lbl>,
## #   magebrt <dbl+lbl>, birthint <dbl+lbl>, wmweight <dbl>, wscore.y <dbl>,
## #   windex5.y <dbl+lbl>, wscoreu.y <dbl>, windex5u.y <dbl+lbl>,
## #   wscorer.y <dbl>, windex5r.y <dbl+lbl>, windex2 <dbl>, date_death <dbl>,
## #   mdead_survey <dbl>, ydead_survey <dbl>, age_death <dbl>, child_died <fct>,
## #   tot_child_died <int>, child18_died <fct>, tot_child18_died_5y <int>,
## #   childu18_dies_per_wom_5y <int>, wm7 <dbl+lbl>, cm1 <dbl+lbl>,
## #   cm8 <dbl+lbl>, cm9a <dbl>, cm9b <dbl>, women_wm <dbl>, marital_wom <fct>,
## #   hh1 <dbl>, hh2 <dbl>, hh3 <dbl>, hh4 <dbl>, hh5d.y <dbl>, hh5m.y <dbl>, ...

4.1.9 Step 1.9: Renaming Demographic Variables

merged_dat %>% 
  mutate(
    fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
    ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
  ) %>% 
  ungroup() %>% 
  mutate(
    no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
    no_male_eligible = structure(NA, label = "Household has no eligible man"),
    no_child_eligible = structure(NA, label = "Household has no children eligible"),
    no_adults_eligible = structure(NA, 
                                label = "Household has no eligible women and men"),
    no_child_fem_eligible = structure(NA, 
                                label = "Household has no children or women eligible"),
    no_eligibles = structure(NA, 
                                label = "Household has no eligible women, men or children"),
    no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
  ) %>% 
  select(-c("fem_eligible", "hh_n_fem_eligible")) %>% 
  mutate(
    weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
    area = structure(ifelse(hh6.x == 2,0,hh6.x),
                     label = "Area: urban-rural",
                     labels = c("rural","urban")), #Area: urban or rural||factors in
                                                   #R start with 1, to maintain                                                        #consistency with stata code                                                        #creating these integers as                                                         #numerics with levels
    sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
    age = structure(ifelse(hl6>=98,NA,hl6),
                    label = "Age of household member"), #Age of HH member
    agec7 = 
      structure(
        factor(
          case_when(
            age >= 0 & age <= 4 ~ "0-4",
            age >= 5 & age <= 9 ~ "5-9",
            age >= 10 & age <= 14 ~ "10-14",
            age >= 15 & age <= 17 ~ "15-17",
            age >= 18 & age <= 59 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
        ), label = "Age groups (7groups)"
      ), # Age group
    agec4 =
      structure(
        factor(
          case_when(
            age >= 0 & age <= 9 ~ "0-9",
            age >= 10 & age <= 17 ~ "10-17",
            age >= 18 & age <= 58 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-9","10-17","18-59","60+")
        ), label = "Age groups (4groups)"
      ), # Age group
    member = 1
  ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hhsize = structure(sum(member), label = "Household size")
  ) %>% 
  ungroup() %>% 
  tabyl(hhsize, show_na = T) %>% 
  gt()
hhsize n percent
1 613 0.0155662773
2 2872 0.0729304215
3 5892 0.1496190960
4 11296 0.2868461148
5 7860 0.1995937024
6 5220 0.1325545962
7 2485 0.0631030980
8 1288 0.0327069578
9 765 0.0194261046
10 550 0.0139664804
11 231 0.0058659218
12 96 0.0024377857
13 104 0.0026409345
14 14 0.0003555104
15 30 0.0007618080
16 64 0.0016251905

Note that for varible agec7, there are only 6 groups but the label in STATA code says 7 groups. Following the STATA code.

merged_dat %>% 
  mutate(
    fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
    ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
  ) %>% 
  ungroup() %>% 
  mutate(
    no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
    no_male_eligible = structure(NA, label = "Household has no eligible man"),
    no_child_eligible = structure(NA, label = "Household has no children eligible"),
    no_adults_eligible = structure(NA, 
                                label = "Household has no eligible women and men"),
    no_child_fem_eligible = structure(NA, 
                                label = "Household has no children or women eligible"),
    no_eligibles = structure(NA, 
                                label = "Household has no eligible women, men or children"),
    no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
  ) %>% 
  select(-c("fem_eligible", "hh_n_fem_eligible")) %>% 
  mutate(
    weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
    area = structure(ifelse(hh6.x == 2,0,hh6.x),
                     label = "Area: urban-rural",
                     labels = c("rural","urban")), #Area: urban or rural||factors in
                                                   #R start with 1, to maintain                                                        #consistency with stata code                                                        #creating these integers as                                                         #numerics with levels
    sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
    age = structure(ifelse(hl6>=98,NA,hl6),
                    label = "Age of household member"), #Age of HH member
    agec7 = 
      structure(
        factor(
          case_when(
            age >= 0 & age <= 4 ~ "0-4",
            age >= 5 & age <= 9 ~ "5-9",
            age >= 10 & age <= 14 ~ "10-14",
            age >= 15 & age <= 17 ~ "15-17",
            age >= 18 & age <= 59 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
        ), label = "Age groups (7groups)"
      ), # Age group
    agec4 =
      structure(
        factor(
          case_when(
            age >= 0 & age <= 9 ~ "0-9",
            age >= 10 & age <= 17 ~ "10-17",
            age >= 18 & age <= 58 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-9","10-17","18-59","60+")
        ), label = "Age groups (4groups)"
      ), # Age group
    member = 1
  ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hhsize = structure(sum(member), label = "Household size")
  ) %>% 
  ungroup() %>% 
  select(-c("member")) %>% 
  mutate(
    region = structure(forcats::fct_relevel(
      fct_recode(as_factor(hh7.x),
                 "North Central & Central Coastal Area" = 
                   "North Central and Central Coastal area",
                 "Northern Midlands & Mountain Area" = 
                   "Northern Midlands and Mountain area"), 
                                  c("Central Highlands",
                                    "Mekong River Delta",
                                    "North Central & Central Coastal Area",
                                    "Northern Midlands & Mountain Area",
                                    "Red River Delta",
                                    "South East")), 
                       label = "Region for Subnational Decomposition")
  ) %>% 
  tabyl(hh7.x, region) %>% 
  gt()
hh7.x Central Highlands Mekong River Delta North Central & Central Coastal Area Northern Midlands & Mountain Area Red River Delta South East
1 0 0 0 0 6198 0
2 0 0 0 6736 0 0
3 0 0 6134 0 0 0
4 7179 0 0 0 0 0
5 0 0 0 0 0 6748
6 0 6385 0 0 0 0

Notice how leveling of factors is significantly different in R. The {Forcats} is a very useful tool.

4.2 Step 2: Data preparation

Standardization of the 10 Global MPI Indicatiors.

Identification if non-deprived and deprived individuals.

4.2.1 Step 2.1: Years of Schooling

Comments from the STATA Code:

Note: In Viet Nam, children enter primary school aged 6 years, enter lower secondary school at 11 and upper secondary school at 15. There are grades in primary school (Grades 1 to 5), four in lower secondary school (Grades 6 to 9) and three in upper secondary school (Grades 10 to 12).(pg 192 report)

Let us first save the data frame created in the environment. This way we reduce the length of the code and copy pasting it again and again.

merged_dat %>% 
  mutate(
    fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
    ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
  ) %>% 
  ungroup() %>% 
  mutate(
    no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
    no_male_eligible = structure(NA, label = "Household has no eligible man"),
    no_child_eligible = structure(NA, label = "Household has no children eligible"),
    no_adults_eligible = structure(NA, 
                                label = "Household has no eligible women and men"),
    no_child_fem_eligible = structure(NA, 
                                label = "Household has no children or women eligible"),
    no_eligibles = structure(NA, 
                                label = "Household has no eligible women, men or children"),
    no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
  ) %>% 
  select(-c("fem_eligible", "hh_n_fem_eligible")) %>% 
  mutate(
    weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
    area = structure(ifelse(hh6.x == 2,0,hh6.x),
                     label = "Area: urban-rural",
                     labels = c("rural","urban")), #Area: urban or rural||factors in
                                                   #R start with 1, to maintain                                                        #consistency with stata code                                                        #creating these integers as                                                         #numerics with levels
    sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
    age = structure(ifelse(hl6>=98,NA,hl6),
                    label = "Age of household member"), #Age of HH member
    agec7 = 
      structure(
        factor(
          case_when(
            age >= 0 & age <= 4 ~ "0-4",
            age >= 5 & age <= 9 ~ "5-9",
            age >= 10 & age <= 14 ~ "10-14",
            age >= 15 & age <= 17 ~ "15-17",
            age >= 18 & age <= 59 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
        ), label = "Age groups (7groups)"
      ), # Age group
    agec4 =
      structure(
        factor(
          case_when(
            age >= 0 & age <= 9 ~ "0-9",
            age >= 10 & age <= 17 ~ "10-17",
            age >= 18 & age <= 58 ~ "18-59",
            age >= 60 ~ "60+"
          ), levels = c("0-9","10-17","18-59","60+")
        ), label = "Age groups (4groups)"
      ), # Age group
    member = 1
  ) %>% 
  group_by(hh_id) %>% 
  mutate(
    hhsize = structure(sum(member), label = "Household size")
  ) %>% 
  ungroup() %>% 
  select(-c("member")) %>% 
  mutate(
    region = structure(forcats::fct_relevel(
      fct_recode(as_factor(hh7.x),
                 "North Central & Central Coastal Area" = 
                   "North Central and Central Coastal area",
                 "Northern Midlands & Mountain Area" = 
                   "Northern Midlands and Mountain area"), 
                                  c("Central Highlands",
                                    "Mekong River Delta",
                                    "North Central & Central Coastal Area",
                                    "Northern Midlands & Mountain Area",
                                    "Red River Delta",
                                    "South East")), 
                       label = "Region for Subnational Decomposition")
  ) -> merged_dat

Table ed4b ed4a

ed4b 0 1 2 3 4 5 8 NA_
0 0 799 984 715 0 0 1 0
1 0 1062 0 0 0 0 0 0
2 0 1590 0 0 0 0 0 0
3 0 1848 0 0 0 0 0 0
4 0 1762 0 0 0 0 0 0
5 0 2145 0 0 0 0 0 0
6 0 0 1802 0 0 0 0 0
7 0 0 2270 0 0 0 0 0
8 0 0 1741 0 0 0 0 0
9 0 0 5350 0 0 0 0 0
10 0 0 0 1225 0 0 0 0
11 0 0 0 1008 0 0 0 0
12 0 0 0 3533 0 0 0 0
97 0 0 1 2 0 0 0 0
98 0 2 2 4 0 0 8 0
99 0 2 1 4 0 0 8 0
NA 734 0 0 0 1553 3677 0 5547

Table age ed6a when ed5 is 1

age 0 1 2 3 4 5 9
5 623 5 0 0 0 0 1
6 71 624 0 0 0 0 0
7 3 658 1 0 0 0 0
8 1 664 0 0 0 0 3
9 0 661 0 0 0 0 1
10 0 664 5 0 0 0 1
11 1 115 481 0 0 0 0
12 0 32 550 0 0 0 1
13 0 8 634 0 0 0 0
14 0 8 527 4 0 0 0
15 0 2 128 316 1 0 0
16 0 1 24 457 1 0 1
17 0 1 8 462 3 1 0
18 0 0 1 102 22 133 0
19 0 0 0 24 32 145 1
20 0 0 1 5 42 152 0
21 0 0 0 3 15 134 0
22 0 0 0 1 7 102 0
23 0 0 0 0 14 58 0
24 0 0 0 0 4 26 1

Creating edulevel and table ed4a, ed3

ed4a 1 2 NA_
0 734 0 0
1 9210 0 0
2 12151 0 0
3 6491 0 0
4 1553 0 0
5 3677 0 0
8 17 0 0
NA 0 2201 3346

Creating eduhighyear and table ed4b, ed43

ed4b 1 2 NA_
0 2499 0 0
1 1062 0 0
2 1590 0 0
3 1848 0 0
4 1762 0 0
5 2145 0 0
6 1802 0 0
7 2270 0 0
8 1741 0 0
9 5350 0 0
10 1225 0 0
11 1008 0 0
12 3533 0 0
97 3 0 0
98 16 0 0
99 15 0 0
NA 5964 2201 3346

Now, I feel it is as good a time as any, since we have moved this forward in to mutating new variables, to introduce the {magrittr} pipes. Up till now in the document you must have seen %>% appear everywhere. This is a pipe operator. What it does is to allow us to write code where the format is similar to f(g(y(x))), where f,g, and y are function, and x is input for y and y(x) is input of g and so on.

One must have noticed that if I am mutating multiple varibables of a data frame I ahve to reassign the mutated data frame with -> or <- operator. This can be redundant at times. Consider the following example.

let us say I want a new variable c = a*b in the same data frame ex_pipe and keep rows where b is 11. Below is code to do it with using pipe operator.

## # A tibble: 3 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1    11    11
## 2     3    11    33
## 3     4    11    44

I might not want to use -> operator as it can get redundant. There is a cousin of the %>% operator that helps us tackle this problem. For instance, say i want ot add 1 to c in each row and update the data prame ex_pipe.

## # A tibble: 3 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1    11    12
## 2     3    11    34
## 3     4    11    45

We will be using %<>% where necessary.

Ok. Back on track now.

updating merged_dat.

Table eduhighyear

eduhighyear n percent valid_percent
0 8046 0.20431691 0.24102810
1 1062 0.02696800 0.03181355
2 1590 0.04037583 0.04763046
3 1848 0.04692737 0.05535918
4 1762 0.04474352 0.05278294
5 2145 0.05446927 0.06425619
6 1802 0.04575927 0.05398119
7 2270 0.05764347 0.06800072
8 1741 0.04421026 0.05215386
9 5350 0.13585576 0.16026601
10 1225 0.03110716 0.03669642
11 1008 0.02559675 0.03019591
12 3533 0.08971559 0.10583548
NA 5998 0.15231082 NA

Cleaning inconsistencies.

Creating Years of schooling.

Checking for further inconsistencies.

eduyears 0 1 2 3 4 5 NA_
0 6281 2707 1 0 0 0 0
1 0 356 0 0 0 0 0
2 0 951 0 0 0 0 0
3 0 1298 0 0 0 0 0
4 0 1753 0 0 0 0 0
5 0 2145 987 0 0 0 0
6 0 0 1802 0 0 0 0
7 0 0 2270 0 0 0 0
8 0 0 1741 0 0 0 0
9 0 0 5350 725 0 0 0
10 0 0 0 1225 0 0 0
11 0 0 0 1008 0 0 0
12 0 0 0 3533 0 0 0
13 0 0 0 0 1553 3677 0
NA 0 0 0 0 0 0 17

Table ed4b age

ed4b 7
0 1

Creating Control variables regarding information on years of education.

no_missing_edu n percent
FALSE 8 0.0002031488
TRUE 39372 0.9997968512

Dropping temp, temp2 and hhs

4.2.1.1 Standard MPI

Comments from STATA code:

The entire household is considered deprived if no household member aged 10 years or older has completed SIX years of schooling.

hh_years_edu6 n percent
FALSE 3439 0.08732859
TRUE 35941 0.91267141

4.2.2 Step 2.2: Child School Attendance

Creating the attendence variable.

age 1 2 9 NA_
0 0 0 0 732
1 0 0 0 787
2 0 0 0 634
3 0 0 0 553
4 0 0 0 640
5 629 4 1 34
6 695 3 0 12
7 662 1 0 9
8 668 6 0 9
9 662 3 0 6
10 670 5 0 9
11 597 12 0 9
12 583 14 0 7
13 642 45 0 11
14 539 61 0 14
15 447 103 1 7
16 484 159 0 11
17 475 202 3 18
18 258 327 4 9
19 202 339 3 18
20 200 363 2 18
21 152 369 7 21
22 110 460 5 20
23 72 540 13 22
24 31 518 17 20
25 0 0 0 595
26 0 0 0 605
27 0 0 0 528
28 0 0 0 618
29 0 0 0 604
30 0 0 0 651
31 0 0 0 634
32 0 0 0 575
33 0 0 0 569
34 0 0 0 574
35 0 0 0 540
36 0 0 0 574
37 0 0 0 600
38 0 0 0 550
39 0 0 0 580
40 0 0 0 552
41 0 0 0 583
42 0 0 0 576
43 0 0 0 570
44 0 0 0 579
45 0 0 0 612
46 0 0 0 498
47 0 0 0 514
48 0 0 0 542
49 0 0 0 571
50 0 0 0 567
51 0 0 0 566
52 0 0 0 449
53 0 0 0 561
54 0 0 0 508
55 0 0 0 485
56 0 0 0 441
57 0 0 0 383
58 0 0 0 341
59 0 0 0 370
60 0 0 0 298
61 0 0 0 321
62 0 0 0 222
63 0 0 0 272
64 0 0 0 237
65 0 0 0 225
66 0 0 0 167
67 0 0 0 172
68 0 0 0 132
69 0 0 0 143
70 0 0 0 150
71 0 0 0 155
72 0 0 0 123
73 0 0 0 154
74 0 0 0 125
75 0 0 0 131
76 0 0 0 125
77 0 0 0 105
78 0 0 0 110
79 0 0 0 92
80 0 0 0 110
81 0 0 0 104
82 0 0 0 78
83 0 0 0 113
84 0 0 0 105
85 0 0 0 62
86 0 0 0 51
87 0 0 0 47
88 0 0 0 32
89 0 0 0 22
90 0 0 0 27
91 0 0 0 21
92 0 0 0 23
93 0 0 0 20
94 0 0 0 14
95 0 0 0 29
attendance n percent valid_percent
0 30546 0.775672930 0.7767775
1 8778 0.222905028 0.2232225
NA 56 0.001422042 NA

4.2.2.1 Standard MPI

Comments from STATA code:

The entire household is considered deprived if any school-aged child is not attending school up to class 8.

## # A tibble: 0 x 276
## # ... with 276 variables: hh1.x <dbl>, hh2.x <dbl>, hl1 <dbl>, hl3 <dbl+lbl>,
## #   hl4 <dbl+lbl>, hl5m <dbl+lbl>, hl5y <dbl+lbl>, hl6 <dbl+lbl>,
## #   hl7 <dbl+lbl>, hl7b <dbl+lbl>, hl11 <dbl+lbl>, hl12 <dbl+lbl>,
## #   hl12a <dbl+lbl>, hl13 <dbl+lbl>, hl14 <dbl+lbl>, hl14a <dbl+lbl>,
## #   hl15 <dbl>, ed1 <dbl>, ed3 <dbl+lbl>, ed4a <dbl+lbl>, ed4b <dbl+lbl>,
## #   ed5 <dbl+lbl>, ed6a <dbl+lbl>, ed6b <dbl+lbl>, ed7 <dbl+lbl>,
## #   ed8a <dbl+lbl>, ed8b <dbl+lbl>, mp3 <dbl>, mp5 <dbl+lbl>, mp6 <dbl+lbl>,
## #   mp7 <dbl+lbl>, hh5d.x <dbl>, hh5m.x <dbl>, hh5y.x <dbl>, hh6.x <dbl+lbl>,
## #   hh7.x <dbl+lbl>, mline <dbl+lbl>, fline <dbl+lbl>, ethnicity.x <dbl+lbl>,
## #   helevel.x <dbl+lbl>, melevel <dbl+lbl>, felevel <dbl+lbl>,
## #   schage <dbl+lbl>, hhweight.x <dbl>, wscore.x <dbl>, windex5.x <dbl+lbl>,
## #   wscoreu.x <dbl>, windex5u.x <dbl+lbl>, wscorer.x <dbl>,
## #   windex5r.x <dbl+lbl>, hh_id <dbl>, ind_id <dbl>, hh1.y <dbl>, hh2.y <dbl>,
## #   ln <dbl>, bhln <dbl>, bh2 <dbl+lbl>, bh3 <dbl+lbl>, bh4m <dbl+lbl>,
## #   bh4y <dbl+lbl>, bh5 <dbl+lbl>, bh6 <dbl+lbl>, bh7 <dbl+lbl>, bh8 <dbl+lbl>,
## #   bh9u <dbl+lbl>, bh9n <dbl+lbl>, bh10 <dbl+lbl>, bh4c <dbl>, bh4f <dbl+lbl>,
## #   bh9c <dbl>, bh9f <dbl+lbl>, hh6.y <dbl+lbl>, hh7.y <dbl+lbl>, wdoi <dbl>,
## #   wdob <dbl>, ethnicity.y <dbl+lbl>, welevel <dbl+lbl>, brthord <dbl+lbl>,
## #   magebrt <dbl+lbl>, birthint <dbl+lbl>, wmweight <dbl>, wscore.y <dbl>,
## #   windex5.y <dbl+lbl>, wscoreu.y <dbl>, windex5u.y <dbl+lbl>,
## #   wscorer.y <dbl>, windex5r.y <dbl+lbl>, windex2 <dbl>, date_death <dbl>,
## #   mdead_survey <dbl>, ydead_survey <dbl>, age_death <dbl>, child_died <fct>,
## #   tot_child_died <int>, child18_died <fct>, tot_child18_died_5y <int>,
## #   childu18_dies_per_wom_5y <int>, wm7 <dbl+lbl>, cm1 <dbl+lbl>,
## #   cm8 <dbl+lbl>, ...
no_missing_atten n percent
TRUE 39380 1

Notice there is a deviation from STATA code : replace no_missing_aten = no_missing_atten/hhs. This is because when both variables are 0 NaNs are generated.

hh_child_atten n percent
0 1186 0.03011681
1 38194 0.96988319

4.2.3 Step 2.3: Nutrition

Comments from STATA code:

Note: Anthropometric data was not collected for children under 5 as part of the Viet Nam MICS 2014 dataset.

4.2.4 Step 2.4: Chlid Mortality

child_mortality_f n percent valid_percent
0 36685 0.9315642458 0.9458553565
1 1714 0.0435246318 0.0441923424
2 310 0.0078720163 0.0079927807
3 54 0.0013712544 0.0013922908
4 22 0.0005586592 0.0005672296
NA 595 0.0151091925 NA
##  child_mortality     n      percent valid_percent
##                0 36685 0.9315642458  0.9458553565
##                1  1714 0.0435246318  0.0441923424
##                2   310 0.0078720163  0.0079927807
##                3    54 0.0013712544  0.0013922908
##                4    22 0.0005586592  0.0005672296
##               NA   595 0.0151091925            NA

4.2.4.1 Standard MPI

Comments from STATA code:

The standard MPI indicator takes a value of “0” if women in the household reported mortality among children under 18 in the last 5 years from the survey year. The indicator takes a value of “1” if eligible women within the household reported

(i) no child mortality or

(ii) if any child died longer than 5 years from the survey year or

(iii) if any child 18 years and older died in the last 5 years. Households were replaced with a value of “1” if eligible men within the household reported no child mortality in the absence of information from women. The indicator takes a missing value if there was missing information on reported death from eligible individuals.

Table for childu18_died_5y.

comments form the STATA code:

The ‘childu18_died_per_wom_5y’ variable was constructed in Step 1.2 using information from individual women who ever gave birth in the BH file. The missing values represent eligible woman who have never ever given birth and so are not present in the BR file. But these ‘missing women’ may be living in households where there are other women with child mortality information from the BH file. So at this stage, it is important that we aggregate the information that was obtained from the BH file at the household level. This ensures that women who were not present in the BH file is assigned with a value, following the information provided by other women in the household.

childu18_dies_per_wom_5y n percent valid_percent
0 7092 0.1800914170 0.998732573
1 9 0.0002285424 0.001267427
NA 32279 0.8196800406 NA

Assign 0 for wome who gave birth but did not report death

childu18_mortality_5y n percent
0 39336 0.998882682
1 44 0.001117318
hh_mortality_u18_5y n percent
FALSE 44 0.001117318
TRUE 39336 0.998882682

4.2.5 Step 2.5: Electricity

Comments from STATA:

Members of the household are considered deprived if the household has no electricity

4.2.6 Step 2.6: Sanitation

Comments from STATA Code:

Improved sanitation facilities include flush or pour flush toilets to sewer systems, septic tanks or pit latrines, ventilated improved pit latrines, pit latrines with a slab, and composting toilets. These facilities are only considered improved if it is private, that is, it is not shared with other households. Source: https://unstats.un.org/sdgs/metadata/files/Metadata-06-02-01.pdf Note: In cases of mismatch between the country report and the internationally agreed guideline, we followed the report.

Creating new variables

4.2.6.1 Standard MPI

Comments from STATA code:

Members of the household are considered deprived if the household’s sanitation facility is not improved (according to the SDG guideline) or it is improved but shared with other households

toilet_mdg n percent
FALSE 8687 0.2205942
TRUE 30693 0.7794058

4.2.7 Step 2.7: Drinking Water

Comments from the STATA Code:

Improved drinking water sources include the following: piped water into dwelling, yard or plot; public taps or standpipes; boreholes or tubewells; protected dug wells; protected springs; packaged water; delivered water and rainwater which is located on premises or is less than a 30-minute walk from home roundtrip. Source: https://unstats.un.org/sdgs/metadata/files/Metadata-06-01-01.pdf Note: In cases of mismatch between the country report and the internationally agreed guideline, we followed the report.

ws2 n percent
11 2940 0.4577300327
12 126 0.0196170014
13 55 0.0085629768
14 4 0.0006227619
21 2076 0.3232134517
31 651 0.1013545072
32 79 0.0122995485
41 80 0.0124552390
42 4 0.0006227619
51 110 0.0171259536
81 244 0.0379884789
96 54 0.0084072863

comments from STATA code:

Because the quality of bottled water is not known, households using bottled water for drinking are classified as using an improved or unimproved source according to their water source for non-drinking activities such as cooking and hand washing. However, it is important to note that households using bottled water for drinking are classified as unimproved source if this is explicitly mentioned in the country report.

4.2.7.1 Standard MPI

Comments from STATA code:

Members of the household are considered deprived if the household does not have access to improved drinking water (according to the SDG guideline) or safe drinking water is at least a 30-minute walk from home, roundtrip

water_mdg n percent
0 3356 0.08522092
1 36024 0.91477908

4.2.8 Step 2.8: Housing

Comments from STATA code:

Members of the household are considered deprived if the household has a dirt, sand or dung floor

floor 0 1
11 1664 0
21 0 836
22 0 541
31 0 929
32 0 17
33 0 2509
34 0 7332
35 0 29
36 0 25523

Comments from STATA:

Members of the household are considered deprived if the household has wall made of natural or rudimentary materials

wall 0 1
11 42 0
12 1329 0
13 56 0
14 21 0
21 424 0
22 132 0
23 40 0
24 342 0
25 28 0
26 282 0
31 0 21
32 0 78
33 0 30817
34 0 1152
35 0 154
36 0 3309
96 1153 0

Comments from STATA code:

Members of the household are considered deprived if the household has roof made of natural or rudimentary materials

roof 0 1
12 749 0
22 180 0
23 58 0
31 0 17452
33 0 3708
34 0 7779
35 0 9421
37 0 30
38 0 3

4.2.8.1 Standard MPI

Comments from the STATA Code:

housing_1 n percent
0 4793 0.1217115
1 34587 0.8782885

4.2.10 Step 2.10: Assets Ownership

Comments from STATA code:

Assets that are included in the global MPI: Radio, TV, telephone, bicycle, motorbike, refrigerator, car, computer and animal cart

Creating new variables

Grouping telephone and mobiletelephone as a single variable

4.2.14 Setting Dimension weights

Follwoing are the Weights for each dimension:

Dimensions Dimensional_Weights
Assets 0.0555556
Cooking Fuel 0.0555556
Housing 0.0555556
Sanitation 0.0555556
Water 0.0555556
Electricity 0.0555556
Education 0.1666667
School Attendance 0.1666667
Child Mortality 0.3333333

4.2.16 Counting Vector

c_vector n percent
0.00000000 19920 0.5058405282
0.05555556 8265 0.2098781107
0.11111111 4045 0.1027171153
0.16666667 3013 0.0765109192
0.22222222 1182 0.0300152362
0.27777778 1051 0.0266886745
0.33333333 968 0.0245810056
0.38888889 464 0.0117826308
0.44444444 162 0.0041137633
0.50000000 219 0.0055611986
0.55555556 58 0.0014728288
0.61111111 30 0.0007618080
0.66666667 3 0.0000761808

4.2.19 M, H and A for all cut offs (1/3, .20, .50)

Comments from STATA code:

By sumarizing (obtaining the mean) of the identification vector, the individual deprivation share,

and the individual censored c vector at any level of k we will obtain the Multidimensional Headcount

Ratio (H), the Intensity of Poverty among the Poor (A), and the Adjusted Headcount Ratio (M0), respectively.

4.2.19.1 At k = 1/3

Headcount

## [1] 0.04322876

At k = 1/3 , 4.3% of the peopel are poor (multidimensionally deprived).

Intensity

## # A tibble: 1 x 1
##   Intensity
##       <dbl>
## 1     0.381

At k = 1/3, the average Intensity of poverty is 38.1%

MPI

M = A * I

can also be calculated using the mean of the cens_c_vector at k = 1/3

## # A tibble: 1 x 1
##        M
##    <dbl>
## 1 0.0165

Now using the product of H and A

## [1] 0.01648832